Sketchy Inference: Towards Streaming LDA

Authors

  • Jean-Baptiste Tristan
  • Michael L. Wick
  • Joseph Tassarotti
Abstract

Recent developments in inference algorithms based on stochastic expectation-maximization or stochastic cellular automata (SCA) have made it possible to employ a variety of randomized data structures that are unavailable to the dominant inference methods in the Bayesian toolkit, including collapsed Gibbs sampling and stochastic variational inference (SVI). Equipped with this recent capability, we make progress towards a true streaming inference algorithm for latent Dirichlet allocation (LDA) that makes novel use of these randomized data structures. We mean “true” in the sense that the algorithm entirely avoids the need for pre-computation, which many current “streaming” variants of LDA must perform. We find that, despite using various randomized data structures to represent the sufficient statistics, the inference algorithm converges to perplexities similar to those of more conventional LDA while producing equally interpretable topics.
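
To make "randomized data structures to represent the sufficient statistics" concrete, here is a minimal sketch of one such structure, a count-min sketch, used to hold approximate topic-word counts in fixed memory. The abstract does not name the structures the authors use, so the choice of count-min sketch is an assumption made for illustration, and the class name, method names, and parameters below are hypothetical rather than taken from the paper.

```python
import random


class CountMinSketch:
    """Fixed-memory approximate counter (illustrative, not the paper's code).

    With non-negative updates, an estimate never falls below the true count;
    hash collisions can only inflate it.
    """

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One random salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # Map the key to exactly one column in every row.
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, key)) % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The least-inflated row gives the tightest upper bound.
        return min(self.table[row][col] for row, col in self._cells(key))


# Hypothetical usage: count (topic, word) assignments as tokens stream in,
# without knowing the vocabulary ahead of time.
sketch = CountMinSketch(width=1024, depth=4)
for topic, word in [(0, "stream"), (0, "stream"), (1, "sketch")]:
    sketch.add((topic, word))
```

Because keys are hashed rather than indexed, a structure like this needs no vocabulary-building pass over the corpus, which is the kind of pre-computation the abstract says a true streaming algorithm should avoid. The price is that the counts are approximate.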


Similar resources

Online Latent Dirichlet Allocation with Infinite Vocabulary

Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary. This is reasonable in batch settings but not reasonable for streaming and online settings. To address this lacuna, we extend LDA by drawing topics from a Dirichlet process whose base distribution is a distribution over all strings rather than from a finite Dirichlet. We develop inference using online variati...


Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides us with new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods on GPUs for latent Dirichlet Allocation (LDA) models, collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB). ...


A scalable supervised algorithm for dimensionality reduction on streaming data

Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are greatly interesting due to the desirability of real tasks. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification pro...


Streaming Gibbs Sampling for LDA Model

Streaming variational Bayes (SVB) is successful in learning LDA models in an online manner. However, previous attempts at developing online Monte Carlo methods for LDA have had little success, often yielding much worse perplexity than their batch counterparts. We present a streaming Gibbs sampling (SGS) method, an online extension of collapsed Gibbs sampling (CGS). Our empirical study shows...



Publication date: 2017